Search results (all records) where Creators/Authors contains: "Madden, S"

  1. The process of training deep learning models produces a huge amount of metadata, including but not limited to losses, hidden feature embeddings, and gradients. Model diagnosis tools have been developed to analyze losses and feature embeddings with the aim of improving the performance of these models. Gradients, however, despite carrying rich information that is potentially relevant for model interpretation and data debugging, have yet to be fully explored due to their size and complexity. A single gradient is as large as the number of parameters of the neural net, often measured in the tens of millions, which makes it extremely challenging to efficiently collect, store, and analyze large numbers of gradients. In this work, we develop MetaStore to fill this gap. MetaStore leverages our observation that storing certain compact intermediate results produced in the backpropagation process, namely the prefix and suffix gradients, is sufficient for the exact restoration of the original gradient. These prefix and suffix gradients are much more compact than the original gradients, allowing us to address the gradient collection and storage challenges. Furthermore, MetaStore features a rich set of analytics operators that allow users to analyze the gradients for data debugging or model interpretation. Rather than first restoring the original gradients and then running analytics on this decompressed view, MetaStore executes these operators directly on the compact prefix and suffix structures, making gradient-based analytics efficient and scalable. Our experiments on popular deep learning models such as VGG, BERT, and ResNet and on benchmark image and text datasets demonstrate that MetaStore outperforms strong baseline methods by 4x to 678x in storage costs and by 2x to 1000x in running time.
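     The factorization this abstract describes can be illustrated for a single linear layer: the per-example weight gradient is exactly the outer product of the layer input (a prefix-like quantity) and the backpropagated gradient of the layer output (a suffix-like quantity), so storing the two vectors suffices to restore the full matrix, and some analytics can run directly on the factors. Below is a minimal NumPy sketch under that assumption; the function names and the dot-product operator are illustrative, not MetaStore's actual API.

        import numpy as np

        # Sketch only: for a linear layer, the per-example weight gradient
        # factors as dW = g a^T, where a is the layer input ("prefix") and
        # g is the gradient w.r.t. the layer output ("suffix").

        def restore_gradient(prefix, suffix):
            # Exact restoration: materializes the (d_out, d_in) gradient.
            return np.outer(suffix, prefix)

        def grad_dot(p1, s1, p2, s2):
            # Analytics on the compact factors: <g1 a1^T, g2 a2^T>_F
            # = (g1 . g2) * (a1 . a2), never forming the full matrices.
            return (s1 @ s2) * (p1 @ p2)

        rng = np.random.default_rng(0)
        a1, g1 = rng.normal(size=64), rng.normal(size=32)  # 96 floats stored
        a2, g2 = rng.normal(size=64), rng.normal(size=32)  # vs. 2048 per full gradient

        naive = (restore_gradient(a1, g1) * restore_gradient(a2, g2)).sum()
        assert np.isclose(grad_dot(a1, g1, a2, g2), naive)

     The identity turns an O(d_out * d_in) gradient dot product into O(d_out + d_in) work on the stored factors, which is the kind of saving that would be consistent with the storage and runtime gaps reported above.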
  2. Local outlier techniques are known to be effective for detecting outliers in skewed data, where subsets of the data exhibit diverse distribution properties. However, existing methods are not well equipped to support modern high-velocity data streams due to the high complexity of the detection algorithms and their sensitivity to data updates. To tackle these shortcomings, we propose local outlier semantics that operate at an abstraction level, leveraging kernel density estimation (KDE) to effectively detect local outliers from streaming data. On this foundation we design KELOS, a strategy for continuously detecting the top-N KDE-based local outliers over streams and the first streaming local outlier detection approach with linear time complexity. The first innovation of KELOS is the abstract kernel center-based KDE (aKDE) strategy. aKDE accurately yet efficiently estimates the data density at each point, which is essential for local outlier detection. It rests on the observation that a cluster of points close to each other tends to have a similar influence on a target point's density estimate when used as kernel centers; such points can therefore be represented by one abstract kernel center. Next, KELOS's inlier pruning strategy prunes early those points that have no chance of becoming top-N outliers, letting KELOS skip the density and outlier-status computations for them. Together, aKDE and inlier pruning eliminate the performance bottleneck of streaming local outlier detection. Our experimental evaluation demonstrates that KELOS is up to six orders of magnitude faster than existing solutions, while being highly effective in detecting local outliers from streaming data.
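     The aKDE idea can be made concrete with a small sketch: summarize each cluster of nearby points by one weighted kernel center, then evaluate a Gaussian KDE over the handful of centers instead of over every point. The NumPy code below is a simplified, non-streaming illustration under those assumptions; it scores outlierness by raw inverse density rather than KELOS's neighbor-relative local score, and all names are hypothetical.

        import numpy as np

        # Sketch only: density via abstract kernel centers (aKDE-style),
        # with each center weighted by how many points it summarizes.

        def akde_density(points, centers, weights, bandwidth=0.5):
            # Gaussian KDE over k centers instead of n >> k raw points.
            d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
            kern = np.exp(-d2 / (2 * bandwidth ** 2))
            return (kern @ weights) / weights.sum()

        def top_n_outliers(points, centers, weights, n=1):
            # Simplification: lowest density = most outlying. KELOS instead
            # scores density relative to neighboring centers and prunes
            # points whose bounds already rule out a top-N spot.
            return np.argsort(akde_density(points, centers, weights))[:n]

        rng = np.random.default_rng(1)
        cluster = rng.normal(0.0, 0.3, size=(200, 2))       # dense inliers
        point = np.array([[4.0, 4.0]])                      # isolated point
        pts = np.vstack([cluster, point])
        centers = np.vstack([cluster.mean(axis=0), point])  # toy summaries
        weights = np.array([200.0, 1.0])

        print(top_n_outliers(pts, centers, weights))        # -> [200], the isolated point

     Because density is evaluated against two centers rather than 201 raw points, each query costs O(k) instead of O(n); keeping k small as the stream evolves is what the abstract's linear-time claim depends on.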